
gaussian grbm initialization#71

Open
jquetzalcoatl wants to merge 16 commits into
dwavesystems:mainfrom
jquetzalcoatl:feature/gaussian-rbm-init

Conversation

@jquetzalcoatl

@jquetzalcoatl jquetzalcoatl commented Mar 16, 2026

GRBM weights and biases are initialized from a Gaussian N(0, 1/number of nodes).

Hinton guide suggests 0.01 as standard deviation. See https://www.cs.toronto.edu/~hinton/absps/guideTR.pdf

Moreover, setting it to a Gaussian with this dependence on the number of nodes makes the energy extensive and initializes the GRBM in a paramagnetic phase, similar to that described in the Random Energy Model paper:

https://journals.aps.org/prb/abstract/10.1103/PhysRevB.24.2613

See #48
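
The scaling argument above can be written out as a short derivation. This is a sketch under stated assumptions: a fully connected graph with $N$ nodes, couplings $J_{ij}$ drawn i.i.d. from $\mathcal{N}(0, 1/N)$, and spins $s_j = \pm 1$ that are independent of the couplings at initialization. The symbol $\phi_i$ for the local field is introduced here only for illustration.

$$
\phi_i = \sum_{j \neq i} J_{ij} s_j, \qquad
\operatorname{Var}(\phi_i) = \sum_{j \neq i} \operatorname{Var}(J_{ij})\, s_j^2 = \frac{N-1}{N} \xrightarrow{\;N \to \infty\;} 1
$$

$$
E(s) = \sum_{i<j} J_{ij} s_i s_j, \qquad
\operatorname{Var}\big(E(s)\big) = \frac{N(N-1)}{2} \cdot \frac{1}{N} = \frac{N-1}{2}
$$

Under these assumptions the local field on each spin stays $O(1)$ as $N$ grows, and the variance of the energy of any fixed configuration grows linearly in $N$.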

@kevinchern
Collaborator

@jquetzalcoatl IIRC, Hinton's recommendation pertains to zero-one-valued RBMs (bipartite with hidden units). Would it make sense to translate the $0.01$ to the spin-valued equivalent?
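
One way to make the zero-one to spin translation concrete is the change of variables x = (1 + s) / 2. The sketch below is illustrative only: the function names and the energy sign convention (E = sum of linear terms plus sum of pairwise terms) are assumptions, not the library's API. It checks by brute force that both parameterizations give the same energy on a three-node toy model.

```python
import itertools


def binary_to_spin(linear, quadratic):
    """Map E(x) = sum_i a_i x_i + sum_{i<j} W_ij x_i x_j with x in {0, 1}
    to E(s) = offset + sum_i h_i s_i + sum_{i<j} J_ij s_i s_j with s in {-1, +1},
    using x = (1 + s) / 2. Every node that appears in `quadratic` must also
    appear in `linear` (hypothetical helper for illustration)."""
    h = {i: a / 2 for i, a in linear.items()}
    J = {}
    offset = sum(linear.values()) / 2
    for (i, j), w in quadratic.items():
        J[(i, j)] = w / 4   # couplings shrink by a factor of 4
        h[i] += w / 4       # each coupling also contributes to both fields
        h[j] += w / 4
        offset += w / 4
    return h, J, offset


def energy_binary(linear, quadratic, x):
    return (sum(a * x[i] for i, a in linear.items())
            + sum(w * x[i] * x[j] for (i, j), w in quadratic.items()))


def energy_spin(h, J, offset, s):
    return (offset
            + sum(b * s[i] for i, b in h.items())
            + sum(w * s[i] * s[j] for (i, j), w in J.items()))


# Toy zero-one model with weights on the 0.01 scale
linear = {0: 0.0, 1: 0.0, 2: 0.0}
quadratic = {(0, 1): 0.01, (1, 2): -0.02, (0, 2): 0.03}
h, J, offset = binary_to_spin(linear, quadratic)

# Largest energy mismatch over all 2^3 configurations
max_gap = max(
    abs(energy_binary(linear, quadratic, x)
        - energy_spin(h, J, offset, [2 * xi - 1 for xi in x]))
    for x in itertools.product([0, 1], repeat=3)
)
```

Under this convention a zero-one weight W maps to a spin coupling J = W / 4, so a standard deviation of 0.01 on W corresponds to 0.0025 on J. The zero-one weights also induce nonzero spin fields h even when the zero-one biases are zero.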

@jquetzalcoatl
Author

@kevinchern The REM reference is for spin models, i.e., {-1,1}. Ultimately, the initialization pertains to whether the model is ergodic. In this sense, the support only sets an offset energy.

I believe the main motivation for initializing with 0.01 in Hinton's guide is to start in a paramagnetic phase, which ties nicely with the REM/SK spin glass model

Collaborator

@kevinchern kevinchern left a comment

Can you add a release note to go with this?

Comment thread dwave/plugins/torch/models/boltzmann_machine.py Outdated
jquetzalcoatl and others added 2 commits March 16, 2026 16:04
Co-authored-by: Kevin Chern <32395608+kevinchern@users.noreply.github.com>
@jquetzalcoatl
Author

added release note

Collaborator

@kevinchern kevinchern left a comment

Following motivation from references, should h be initialized to 0?

Comment thread releasenotes/notes/gaussian-rbm-init-28fd4d295ef86d77.yaml Outdated
Comment thread dwave/plugins/torch/models/boltzmann_machine.py Outdated
Comment thread dwave/plugins/torch/models/boltzmann_machine.py Outdated
Comment thread dwave/plugins/torch/models/boltzmann_machine.py Outdated
jquetzalcoatl and others added 5 commits March 17, 2026 10:46
Co-authored-by: Kevin Chern <32395608+kevinchern@users.noreply.github.com>
Co-authored-by: Kevin Chern <32395608+kevinchern@users.noreply.github.com>
Co-authored-by: Kevin Chern <32395608+kevinchern@users.noreply.github.com>
Co-authored-by: Kevin Chern <32395608+kevinchern@users.noreply.github.com>
Collaborator

@kevinchern kevinchern left a comment

Tests are failing but otherwise LGTM. Thanks for the much-needed PR @jquetzalcoatl !!

@VolodyaCO offered to take a look at the tests

Comment thread dwave/plugins/torch/models/boltzmann_machine.py Outdated
Comment thread releasenotes/notes/gaussian-rbm-init-28fd4d295ef86d77.yaml Outdated
Comment thread releasenotes/notes/gaussian-rbm-init-28fd4d295ef86d77.yaml Outdated
jquetzalcoatl and others added 3 commits March 17, 2026 10:59
Co-authored-by: Kevin Chern <32395608+kevinchern@users.noreply.github.com>
Co-authored-by: Kevin Chern <32395608+kevinchern@users.noreply.github.com>
Co-authored-by: Kevin Chern <32395608+kevinchern@users.noreply.github.com>
@kevinchern kevinchern requested a review from VolodyaCO March 17, 2026 18:01
@kevinchern
Collaborator

Any updates on this?

@VolodyaCO
Collaborator

The reason for this test failing is very strange. Essentially, the test makes sure that the DVAE forward pass (which does encode -> latent_to_discrete -> decode) matches calling encode -> latent_to_discrete -> decode step by step, i.e., this is a pretty simple unit test:

expected_latents = self.encoders[n_latent_dims](self.data)
expected_discretes = self.dvaes[n_latent_dims].latent_to_discrete(
    expected_latents, n_samples
)
expected_reconstructed_x = self.decoders[n_latent_dims](expected_discretes)

latents, discretes, reconstructed_x = self.dvaes[n_latent_dims].forward(
    x=self.data, n_samples=n_samples
)

assert torch.equal(reconstructed_x, expected_reconstructed_x)
assert torch.equal(discretes, expected_discretes)
assert torch.equal(latents, expected_latents)

Moreover, self.dvaes is built as

self.encoders = {i: Encoder(i) for i in latent_dims_list}
self.decoders = {i: Decoder(latent_features, input_features) for i in latent_dims_list}
self.dvaes = {i: DVAE(self.encoders[i], self.decoders[i]) for i in latent_dims_list}

So even if the encoders/decoders are updated in other tests (because of training), there should be a permanent tracking of the encoders/decoders in the dvaes.

@VolodyaCO
Collaborator

Found the issue and fixed it in a PR to @jquetzalcoatl's repo: jquetzalcoatl#1

Please approve, Javi; this would update the current PR and solve the issue.

Took me a while to get the error!

Fix failing forward method unit tests
Collaborator

@VolodyaCO VolodyaCO left a comment

I have definitely had to manually change the initialisation of GRBM weights whenever I use the GRBM. Thanks for this PR. I think it looks good to merge.

@kevinchern kevinchern self-requested a review April 13, 2026 17:08
Collaborator

@kevinchern kevinchern left a comment

@jquetzalcoatl I added a couple typo fixes, can you accept them?
The remaining questions/comments are for @VolodyaCO and should be good to merge after.

`Hinton's practical guide for RBM training<https://www.cs.toronto.edu/~hinton/absps/guideTR.pdf>`_, which recommends sampling
weights from a Gaussian distribution with mean 0 and standard deviation 0.01 (for zero-one-valued RBMs).
The scaling factor of :math:`1/\sqrt(N)` ensures that the energy functional remains extensive
and initializes the GRBM in a paramagnetic regime, consistent with the `Sherrington-Kirkpatrick model<https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.35.1792>`_.
Collaborator

Suggested change
and initializes the GRBM in a paramagnetic regime, consistent with the `Sherrington-Kirkpatrick model<https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.35.1792>`_.
and initializes the GRBM in a paramagnetic regime, consistent with the `Sherrington-Kirkpatrick model <https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.35.1792>`_.

features:
- |
Initialize ``GraphRestrictedBoltzmannMachine`` weights using Gaussian
random variables with standard deviation equal to :math:`1/\sqrt(N)`, where N
Collaborator

Suggested change
random variables with standard deviation equal to :math:`1/\sqrt(N)`, where N
random variables with standard deviation equal to :math:`1/\sqrt(N)`, where :math:`N`


torch.manual_seed(1234) # Set seed again to ensure that the sampling in the forward method
# is the same as in the expected_discretes
latents, discretes, reconstructed_x = self.dvaes[n_latent_dims].forward(
Collaborator

Sorry if I asked this in the first review for DVAE and forgot, but why does this test call the
forward method explicitly? Calling the model directly is the recommended practice as it has several hooks on top of the forward method. @VolodyaCO
(this question/comment is unrelated to this PR)

Collaborator

I don't remember. We can change it.

torch.testing.assert_close(discretes, expected_discretes)
torch.testing.assert_close(reconstructed_x, expected_reconstructed_x)

assert torch.equal(reconstructed_x, expected_reconstructed_x)
Collaborator

@VolodyaCO was this the fix to failing tests? Are these tests sensitive to the seed..?

Collaborator

The test is not sensitive to the seed. It's just that two calculations that were random-based and converged to the same result no longer converged to the same result with the new initialisation. This was a silent bug, as the two random-based calculations should have been using the same initial seed. If you change the seed to any other seed, it should work.
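
The seeding point can be illustrated with a minimal standard-library sketch. The helper `sample_discretes` is a hypothetical stand-in for a stochastic latent-to-discrete step, not the DVAE code. The actual test resets the generator with `torch.manual_seed(1234)`; here `random.Random` plays that role.

```python
import random


def sample_discretes(n, rng):
    """Hypothetical stand-in for a stochastic discretization step:
    draws n spins in {-1, +1} from the given generator."""
    return [1 if rng.random() < 0.5 else -1 for _ in range(n)]


rng = random.Random(1234)
expected = sample_discretes(16, rng)

# Second path without reseeding: the generator state has advanced,
# so these draws come from a different part of the random stream.
unseeded_repeat = sample_discretes(16, rng)

# Second path with the seed reset first: the draws are reproduced exactly.
rng.seed(1234)
reseeded_repeat = sample_discretes(16, rng)
```

Only the reseeded path is guaranteed to reproduce `expected`. Any agreement between the unseeded path and `expected` depends on the values being sampled, which is consistent with the description above of two random calculations that only happened to converge to the same result.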

@@ -0,0 +1,8 @@
---
features:
Contributor

More an upgrade rather than a feature, no?

Suggested change
features:
upgrade:

- |
Initialize ``GraphRestrictedBoltzmannMachine`` weights using Gaussian
random variables with standard deviation equal to :math:`1/\sqrt(N)`, where N
denotes the number of nodes in the GRBM. The weight-initialization strategy is grounded in `Hinton's practical guide for RBM training <https://www.cs.toronto.edu/~hinton/absps/guideTR.pdf>`_, which recommends sampling weights from a Gaussian distribution with mean 0 and standard deviation 0.01 (for zero-one-valued RBMs). The scaling factor of :math:`1/\sqrt(N)` ensures that the energy functional remains extensive and initializes the GRBM in a paramagnetic regime, consistent with the `Sherrington-Kirkpatrick model<https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.35.1792>`_.
Contributor

Better add some line breaks here, splitting the full paragraph on several lines.

self._linear = torch.nn.Parameter(0.05 * (2 * torch.rand(self._n_nodes) - 1))
self._quadratic = torch.nn.Parameter(5.0 * (2 * torch.rand(self._n_edges) - 1))
self._linear = torch.nn.Parameter(torch.zeros(self._n_nodes))
self._quadratic = torch.nn.Parameter(torch.randn(self._n_edges)/self._n_nodes**0.5)

For extensive energy we need to scale by connectivity, not number of nodes. Scaling by the number of nodes is specific to dense models.


The previous defaults are not great, but they included a factor 5 to reflect an approximation to the device sampling temperature (Adv2/Adv single qubit freezeout temperature). In the new definition this is absent, and might be worth noting as a limitation of the default.


@jackraymond jackraymond left a comment

nu_i = beta * (h_i + sum_j J_ij s_j) controls the typical field (bias) of a variable at initialization. I think we want this to be O(1), i.i.d., and high entropy. I think that is in line with the motivation for the pull request, but leads to some additional considerations. We also want h to be small compared to J, because we want to initialize outside of the weakly coupled regime ideally. h should be just large enough to break the macroscopic sign-symmetry (IMO).

I think we want a strongly coupled model, so h should be just large enough to break the symmetry and no more. I.e. the contribution from h should be O(1):
beta * sum_i h_i s_i ~ 1 which for random s implies beta h_i ~ O(1/sqrt(N)).
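
Spelled out, the step from the O(1) condition to the scaling of h reads as follows. This is a sketch assuming the spins $s_i = \pm 1$ are independent and unbiased and the $h_i$ are i.i.d. with zero mean and standard deviation $\sigma_h$ (a symbol introduced here for illustration):

$$
\operatorname{Var}_s\Big(\beta \sum_i h_i s_i\Big) = \beta^2 \sum_i h_i^2 \approx \beta^2 N \sigma_h^2
\quad\Longrightarrow\quad
\beta^2 N \sigma_h^2 \sim 1
\quad\Longrightarrow\quad
\beta\, \sigma_h \sim \frac{1}{\sqrt{N}}
$$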

For extensive energy I think we require J to scale as 1/sqrt(mean degree). 1/sqrt(N) scaling is appropriate for dense models only.

We might want to think about putting in a beta value that reflects the QPU. E.g. if single qubit freezeout temperature ~ 1/5 we would want to scale down by a factor 5 (I think current default scales the wrong way).

We might want to think about the fact that the J and h-distributions are bounded, so Gaussian is not the maximum-entropy choice. This is probably a technicality because the bounds turn out to be far from the initialization values, but we should certainly discuss the impacts of clipping.
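
The scaling suggestions in this comment could be sketched as below. This is an illustration in standard-library Python, not the implementation in `boltzmann_machine.py`: the function names, the `beta` parameter, and the unit prefactors are all assumptions, and clipping to the device's parameter bounds is deliberately left out.

```python
import math
import random


def init_scales(n_nodes, edges, beta=1.0):
    """Return (j_std, h_std) following the scaling heuristics discussed above.

    j_std ~ 1 / (beta * sqrt(mean degree)) keeps beta * local field at O(1).
    h_std ~ 1 / (beta * sqrt(n_nodes)) keeps beta * sum_i h_i s_i at O(1).
    The prefactors of 1 are assumptions made for this sketch.
    """
    if n_nodes <= 0 or not edges:
        raise ValueError("need at least one node and one edge")
    mean_degree = 2 * len(edges) / n_nodes
    j_std = 1.0 / (beta * math.sqrt(mean_degree))
    h_std = 1.0 / (beta * math.sqrt(n_nodes))
    return j_std, h_std


def sample_parameters(n_nodes, edges, beta=1.0, seed=None):
    """Draw Gaussian biases and couplings with the scales above (no clipping)."""
    rng = random.Random(seed)
    j_std, h_std = init_scales(n_nodes, edges, beta)
    h = [rng.gauss(0.0, h_std) for _ in range(n_nodes)]
    J = {edge: rng.gauss(0.0, j_std) for edge in edges}
    return h, J


# Sparse example: a ring of 8 nodes has mean degree 2
ring_edges = [(i, (i + 1) % 8) for i in range(8)]
ring_j_std, ring_h_std = init_scales(8, ring_edges)

# Dense example: a complete graph on 5 nodes has mean degree 4
dense_edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]
dense_j_std, _ = init_scales(5, dense_edges)
cold_j_std, _ = init_scales(5, dense_edges, beta=5.0)

h, J = sample_parameters(8, ring_edges, seed=0)
```

On a complete graph the mean degree is N - 1, so `j_std` reduces to roughly 1/sqrt(N), which matches the PR's default for dense models. On the ring it stays at 1/sqrt(2) regardless of N. Setting `beta=5.0` shrinks both scales by a factor of 5, in the direction this comment argues for.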


5 participants